Methods and combinations for gene targeting by homologous recombination

ABSTRACT

The invention provides methods and compositions for inserting a DNA sequence in the genome of a cell by homologous recombination. In particular, the method utilizes a selection scheme in which a selection marker gene that encodes a fluorescence protein, such as a green fluorescence protein, is used for selection against random, non-homologous insertions.

This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application No. 60/325,450, filed on Sep. 27, 2001, which is incorporated by reference herein in its entirety.

1. FIELD OF THE INVENTION

The invention relates to methods and compositions for gene targeting by homologous recombination. The invention also relates to DNA constructs that can be used for gene targeting by homologous recombination.

2. BACKGROUND OF THE INVENTION

Understanding the biological function of mammalian genes remains one of the major challenges in the post-genomic era. With the human genome sequenced, less than 20% of the estimated 30,000-50,000 genes (Venter et al., 2001, Science 291:5507; Lander, 2001, Nature 409:860) are well characterized with their biological function known. Gene targeting by homologous recombination is widely used for introducing insertions at targeted genomic loci.

A major problem in gene targeting by homologous recombination is the identification and isolation of cells that have undergone homologous recombination from among a large pool of cells that have undergone random, non-homologous recombination. To circumvent this problem, a method utilizing a positive-negative selection scheme for homologous recombination has been disclosed (see, e.g., U.S. Pat. Nos. 5,487,992; 5,627,059; 5,631,153; and 6,204,061). The method makes use of a vector comprising four DNA sequences: a first DNA sequence which contains at least one sequence portion which is substantially homologous to a portion of a first region of a target DNA sequence; a second DNA sequence containing at least one sequence portion which is substantially homologous to another portion of a second region of a target DNA sequence; a third DNA sequence which is positioned between the first and second DNA sequences and encodes a positive selection marker which when expressed is functional in the target cell in which the vector is used; and a fourth DNA sequence encoding a negative selection marker, also functional in the target cell, which is positioned 5′ to the first or 3′ to the second DNA sequence and is substantially incapable of homologous recombination with the target DNA sequence. In this method, transfection of the cells with the vector produces two different types of cells, one containing random integration of the vector into the genome of the cell and the other containing integration of the vector at the target genomic locus by homologous recombination. Random integration leads to the insertion of all four sequences into the genome, whereas homologous recombination leads to the insertion of only the first through third sequences into the genome. Cells containing integration of the first through third sequences by homologous recombination are selected both positively by way of the positive selection marker and negatively by way of the negative selection marker. However, selection by way of a negative selection marker relies on the use of a selection agent that is toxic to the cells. Such selection may not always be available for all types of cells. Secondly, the method requires culturing the cells under both the positive and negative selection conditions and is therefore time-consuming. Furthermore, host cells may contain their own genes that encode the negative selection marker, which may cause background problems.

U.S. Pat. No. 5,527,674 discloses a method for homologous recombination using a DNA construct comprising a positive selection marker and a negative selection system “antagonistic” to the expression of the positive selection marker. The negative selection system is situated outside the homologous regions and comprises an antisense gene which, when expressed, prevents the expression of the positive selection marker. Cells that have undergone homologous recombination can therefore be selected solely based on the presence of the positive selection marker activity. However, the method relies on, among other things, a DNA construct design in which the promoter for the positive selection marker must be weaker than the promoter for the antisense gene for effective inhibition of the positive selection marker. This requirement of using a weak promoter for the positive selection marker significantly limits the choice of promoters that can be used for efficient selection.

U.S. Pat. No. 6,284,541 discloses a method for homologous recombination. The method utilizes a cell surface marker for selection against random integrations. Selection for the absence of the negative selection marker is carried out by contacting the transfected cells with a binding molecule, e.g., a fluorescence-dye-tagged antibody, and identifying and isolating the cells using, e.g., a fluorescence activated cell sorter (FACS). Since the method relies on binding of a binding molecule to the selection marker expressed on the surface of the transfected cells, background due to non-specific binding may be significant. It is also known that the sensitivity and resolution of a method based on staining using a fluorescence dye-labeled antibody can be low (see, e.g., Wang et al., 1994, Nature 639:400-403). Further, although this method does not require the use of a toxic agent for negative selection, it still involves a separate step of contacting the transfected cells with one or more agents, e.g., a primary antibody and a fluorescence dye-labeled secondary antibody, therefore incurring further time and cost.

More efficient methods for gene targeting by homologous recombination are desirable for large scale gene knockout and function analysis. There is therefore a need for methods that allow more efficient identification and isolation of cells that have undergone homologous recombination from a large pool of cells that have undergone random, non-homologous recombination. In particular, there is a need for methods that have minimal background problems and require fewer separate steps.

Discussion or citation of a reference herein shall not be construed as an admission that such reference is prior art to the present invention.

3. SUMMARY OF THE INVENTION

The invention relates to methods and compositions for inserting a DNA sequence in the genome of cells of a cell type by homologous recombination. The method of the invention utilizes a gene targeting vector comprising a sequence region that encodes a fluorescence protein, such as but not limited to a green fluorescence protein, located outside the homologous sequence regions for selection against random, non-homologous insertions.

The invention provides gene targeting vectors comprising sequences encoding a positive selection marker for selection for integration of all or a portion of the gene targeting vector in the genome of the target cells and at least one fluorescence marker for selection against random integration of the vector in the genome of the target cells. The gene targeting vector of the invention comprises four sequence regions: a first sequence region comprising a nucleotide sequence which is substantially homologous to a first target DNA sequence in the target genome; a second sequence region comprising a nucleotide sequence which is substantially homologous to a second target DNA sequence in the target genome; a third sequence region positioned between the first and second DNA sequence regions and comprising a nucleotide sequence that encodes a positive selection marker; and a fourth sequence region comprising a nucleotide sequence located at 5′ to the first or 3′ to the second sequence region encoding a fluorescence marker for selection against random integration.

The positive selection marker gene can be any gene encoding a measurable and selectable marker in the type of cells, e.g., a type of mammalian cells, known in the art, including but not limited to, a drug resistance gene, such as but not limited to Neomycin/G418, Puromycin, Hygromycin B, Zeocin, or mycophenolic acid resistance gene; a gene encoding a cell surface marker, such as but not limited to a gene encoding CD4, CD8, CD20, HA, or any synthetic or foreign cell surface marker; a gene encoding a fluorescent marker, such as but not limited to a gene encoding green fluorescence protein (GFP), blue fluorescence protein (BFP), red fluorescence protein (RFP), or any variants thereof; a gene encoding β-galactosidase; and a gene encoding β-geo. The positive selection marker gene can also encode a combination of more than one positive selection marker, such as but not limited to a gene that encodes a rsGFP-neo fusion protein.

The third sequence region can also comprise regulatory sequences regulating the expression of the positive selection marker. In one embodiment, the third sequence region comprises a regulatory sequence comprising a promoter, either regulated or constitutive, that regulates the expression of the positive selection marker gene. The regulatory sequences can also comprise other sequences that facilitate expression of the positive selection marker, e.g., enhancers.

The third sequence region can further comprise any other sequences to be inserted into the genome of the target cells. In one embodiment, the third sequence region comprises a regulated expression sequence portion comprising a regulated promoter and a selection marker under the control of the regulated promoter. The regulated promoter can be any transcription regulation system known in the art for the type of cells chosen, including but not limited to a tetracycline regulated gene expression system.

In embodiments in which a regulated expression sequence portion is included, the selection marker gene in the regulated expression sequence portion can be any selection marker that can be expressed in the chosen type of cells, e.g., a chosen type of mammalian cells, known in the art, including but not limited to, drug resistance genes, such as but not limited to Neomycin/G418, Puromycin, Hygromycin B, Zeocin, or mycophenolic acid resistance genes; cell surface marker genes, such as but not limited to genes encoding CD4, CD8, CD20, HA, or any synthetic or foreign cell surface markers; genes encoding fluorescence markers, such as but not limited to genes encoding green fluorescence protein (GFP), blue fluorescence protein (BFP), red fluorescence protein (RFP), or any variants thereof. The selection marker expressed by the selection marker gene in the regulated expression portion can be the same as or different from the positive selection marker. In a preferred embodiment, the selection marker expressed by the selection marker gene in the regulated expression portion is different from the positive selection marker.

The third sequence region of the gene targeting vector can still further comprise an optional rapid cloning element comprising a bacterial plasmid replication origin and a bacterial selection marker. Preferably, the replication origin sequence comprises all necessary sequences for initiation of replication and segregation. Any bacterial plasmid replication origin, such as but not limited to Ori, colEI, pSC101, pUC, or f1 phage ori, can be used. Any bacterial selection marker, such as but not limited to a chloramphenicol, ampicillin, tetracycline, or kanamycin resistance marker, can be used in the present invention.

The fourth sequence region comprises a selection marker gene encoding a fluorescence marker, e.g., a green fluorescence marker, to permit fluorescence based selection against random integration of the gene targeting vector in the genome of the target cells. The fourth sequence region is located outside the homologous sequence regions, i.e., at 5′ to the first or 3′ to the second sequence region. Fluorescent markers that can be used in the present invention include, but are not limited to, genes encoding green fluorescence protein (GFP), blue fluorescence protein (BFP), red fluorescence protein (RFP), or any variants thereof. When a fluorescence marker is used as the positive selection marker, it is preferable that the selection marker encoded in the fourth sequence region is a fluorescence marker that has distinguishable excitation and/or emission characteristics from the positive selection marker. In a preferred embodiment, the positive selection marker and the selection marker encoded in the fourth sequence region are one or the other combination of rsGFP and BFP from Qbiogene (Carlsbad, Calif.).

The gene targeting vector can further comprise an optional fifth sequence region comprising a nucleotide sequence encoding a selection marker for selection against random integration, which is located at the opposite end of the gene targeting vector from the fourth sequence region, i.e., at 5′ to the first sequence region if the fourth sequence region is located at the 3′ to the second sequence region, or at 3′ to the second sequence region if the fourth sequence region is located at the 5′ to the first sequence region. The selection marker encoded in the fifth sequence region can be a negative selection marker. Alternatively, the selection marker encoded in the fifth sequence region can be any one of the fluorescence markers. In embodiments in which the selection marker encoded in the fifth sequence region is a fluorescence marker, it can be the same as or different from the fluorescence marker encoded in the fourth sequence region. When a fluorescence marker is used as the positive selection marker, it is preferable that the selection marker encoded in the fifth sequence region is a fluorescence marker that has distinguishable excitation and/or emission characteristics from the positive selection marker.

The invention provides methods for generating a plurality of cells comprising cells that carry an insertion of a DNA sequence in the genome by homologous recombination. The method of the invention comprises transfecting cells of a chosen cell type with a gene targeting vector of the invention, e.g., a gene targeting vector comprising: a first sequence region comprising a nucleotide sequence which is substantially homologous to a first target DNA sequence in the genome of cells of the chosen cell type; a second sequence region comprising a nucleotide sequence which is substantially homologous to a second target DNA sequence in the genome of cells of the chosen cell type; a third sequence region located between said first and second sequence regions, comprising a nucleotide sequence that encodes a positive selection marker; and a fourth sequence region comprising a nucleotide sequence encoding a fluorescence marker, located at 5′ to said first or 3′ to said second sequence region, wherein said positive selection marker is expressed in said cells that carry said insertion by homologous recombination, and wherein said fluorescence marker encoded in said fourth sequence region is not expressed in said cells that carry said insertion by homologous recombination.

In the methods of the invention, the plurality of cells comprising cells that carry an insertion of a DNA sequence in the genome by homologous recombination can be selected by selecting for the presence of the positive selection marker activity and the absence of the activity of the selection marker or markers encoded in the outside regions, i.e., the fourth and/or the fifth sequence regions. In a preferred embodiment, a drug resistance gene is used as the positive selection marker. In this embodiment, the selection for cells carrying the insertion of the positive selection marker gene can be achieved by culturing the transfected cells in the presence of the corresponding drug. In another preferred embodiment, a fluorescence marker is used as the positive selection marker. In this embodiment, the selection for cells carrying the insertion of the positive selection marker gene can be achieved by any fluorescence based cell sorting methods known in the art, e.g., by FACS. The selection against random, non-homologous integration of the gene targeting vector can be carried out by detecting the fluorescence from the fluorescence marker encoded in the fourth sequence region using any fluorescence based cell sorting methods known in the art, e.g., by FACS. The step of selection against random, non-homologous integration of the gene targeting vector can be carried out before, concurrently with, or after the step of selection for the presence of the positive selection marker. When a fluorescence based cell sorting method is used for selection for the presence of the positive selection marker and/or against the presence of the fluorescence markers encoded in the outside regions, the fluorescence window is preferably set such that the cells that carry the insertion of the DNA sequence by homologous recombination constitute at least 10%, 30%, 50%, 70%, or 90% of the plurality of cells.
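As a rough illustration of the combined selection described above, the following Python sketch keeps cells that show positive selection marker activity and whose outside-marker fluorescence falls at or below a chosen window, then reports what fraction of the kept cells carry the targeted insertion. The `Cell` record, the arbitrary intensity units, the `targeted` ground-truth flag (which in practice would come from later characterization, e.g., PCR), and the example values are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    """Hypothetical per-cell measurement used only for illustration."""
    drug_resistant: bool         # positive selection marker activity present
    outside_fluorescence: float  # signal from the fourth/fifth region marker (arbitrary units)
    targeted: bool               # ground truth: insertion by homologous recombination


def select(cells: List[Cell], max_fluorescence: float) -> List[Cell]:
    """Keep cells positive for the selection marker and within the fluorescence window."""
    return [
        c for c in cells
        if c.drug_resistant and c.outside_fluorescence <= max_fluorescence
    ]


def targeted_fraction(selected: List[Cell]) -> float:
    """Fraction of selected cells that are true homologous recombinants."""
    if not selected:
        return 0.0
    return sum(c.targeted for c in selected) / len(selected)


# Made-up population: two targeted cells, one dim random integrant,
# one bright random integrant, and one untransfected (drug-sensitive) cell.
cells = [
    Cell(True, 5.0, True),
    Cell(True, 8.0, True),
    Cell(True, 12.0, False),
    Cell(True, 900.0, False),
    Cell(False, 3.0, False),
]

loose = select(cells, max_fluorescence=100.0)
tight = select(cells, max_fluorescence=10.0)
print(len(loose), targeted_fraction(loose))
print(len(tight), targeted_fraction(tight))
```

With these made-up values, tightening the window from 100 to 10 removes the dim random integrant and raises the targeted fraction from two thirds to 1.0. Real fluorescence distributions overlap, so an appropriate window would have to be determined empirically.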

Cells that are selected can be further characterized by any methods known in the art. In one embodiment, standard PCR and sequencing procedures are used to characterize the cells. In another embodiment, cells are characterized by making use of the rapid cloning element. In this embodiment, genomic regions carrying the insertions are characterized by restriction digesting the rapid cloning element and its flanking genomic DNA, recircularizing by DNA ligation, and transforming the ligated DNA into bacterial cells. The plasmids isolated from transformed bacteria are used to determine the DNA sequence of the flanking genomic sequences by any DNA sequencing methods known in the art.

4. BRIEF DESCRIPTION OF FIGURES

FIG. 1 shows a schematic illustration of the method of the invention.

FIG. 2 shows exemplary configurations of gene targeting vectors of the invention.

FIG. 3 shows the restriction map of gene targeting vector 1.

FIG. 4 shows the restriction map of gene targeting vector 2.

FIG. 5 shows the restriction map of gene targeting vector 3.

FIGS. 6A and B show sequences of homologous recombination region 1 (SEQ ID NO:1) and homologous recombination region 2 (SEQ ID NO:2) for targeting the human TSG 101 gene.

5. DETAILED DESCRIPTION OF THE INVENTION

The invention provides methods and compositions for inserting a DNA sequence in the genome of cells of a cell type by homologous recombination. The method of the invention utilizes a gene targeting vector comprising a sequence region that encodes a fluorescence protein, such as but not limited to a green fluorescence protein, located outside the homologous sequence regions, for selection against random, non-homologous insertions.

The method of the invention can be used to target any genomic sequences in any cells, including but not limited to, any plant or animal cells, e.g., mammalian cells. Any cell type can be used in the present invention, including but not limited to, somatic cells and stem cells.

5.1. Gene Targeting Vectors

The invention provides gene targeting vectors comprising sequences encoding a positive selection marker for selection for integration of all or a portion of the gene targeting vector in the genome of the target cells and at least one fluorescence marker for selection against random integration of the vector in the genome of the target cells. The gene targeting vector of the invention comprises four sequence regions: a first sequence region comprising a nucleotide sequence which is substantially homologous to a first target DNA sequence in the target genome; a second sequence region comprising a nucleotide sequence which is substantially homologous to a second target DNA sequence in the target genome; a third sequence region positioned between the first and second DNA sequence regions and comprising a nucleotide sequence that encodes a positive selection marker; and a fourth sequence region comprising a nucleotide sequence located at 5′ to the first or 3′ to the second sequence region encoding a fluorescence marker for selection against random integration. (See, e.g., FIGS. 3-5 for exemplary gene targeting vectors.) The DNA construct can further comprise an optional fifth sequence region comprising a nucleotide sequence encoding a selection marker for selection against random integration, which fifth sequence region is located at the opposite end of the gene targeting vector from the fourth sequence region, i.e., at 5′ to the first sequence region if the fourth sequence region is located at the 3′ to the second sequence region, or at 3′ to the second sequence region if the fourth sequence region is located at the 5′ to the first sequence region. When a cell is transfected with the gene targeting vector of the invention, homologous recombination at the targeted genomic locus results in the integration of the first through third sequence regions at the targeted locus and the loss of the selection marker gene or genes located in the fourth and the fifth, if applicable, sequence regions. Cells carrying an insertion at the targeted locus can therefore be identified by the presence of the activity of the positive selection marker encoded by the third sequence region and the absence of fluorescence of the fluorescence protein or proteins encoded by the fourth and/or fifth sequence regions.
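The vector layout and the expected marker phenotypes described above can be sketched in Python as follows. This is only an illustrative model under simplifying assumptions: the class and function names are hypothetical, the marker names (`neo`, `rsGFP`, `BFP`) are used purely as example labels, and random integration is idealized as inserting every region of the vector intact.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class TargetingVector:
    """Hypothetical model of the four (optionally five) sequence regions."""
    positive_marker: str                 # encoded in the third sequence region
    fourth_marker: str                   # fluorescence marker outside the homologous regions
    fourth_side: str = "3prime"          # "5prime": 5' to first region; "3prime": 3' to second region
    fifth_marker: Optional[str] = None   # optional marker at the opposite end

    def layout(self) -> List[str]:
        """Region order from 5' to 3'; the fifth region sits opposite the fourth."""
        if self.fourth_side not in ("5prime", "3prime"):
            raise ValueError("fourth_side must be '5prime' or '3prime'")
        core = ["first", "third", "second"]
        fifth = ["fifth"] if self.fifth_marker is not None else []
        if self.fourth_side == "3prime":
            return fifth + core + ["fourth"]
        return ["fourth"] + core + fifth

    def outside_markers(self) -> Set[str]:
        """Markers encoded outside the homologous sequence regions."""
        markers = {self.fourth_marker}
        if self.fifth_marker is not None:
            markers.add(self.fifth_marker)
        return markers


def integrated_markers(vector: TargetingVector, event: str) -> Set[str]:
    """Markers expected in the genome after an idealized integration event."""
    if event == "homologous":
        # Only the first through third regions integrate; outside markers are lost.
        return {vector.positive_marker}
    if event == "random":
        # Idealized: the whole vector integrates, including the outside markers.
        return {vector.positive_marker} | vector.outside_markers()
    raise ValueError("event must be 'homologous' or 'random'")


def looks_targeted(vector: TargetingVector, observed: Set[str]) -> bool:
    """Positive marker present and no outside marker detected."""
    return vector.positive_marker in observed and not (observed & vector.outside_markers())


vector = TargetingVector("neo", "rsGFP", fourth_side="3prime", fifth_marker="BFP")
print(vector.layout())
print(looks_targeted(vector, integrated_markers(vector, "homologous")))
print(looks_targeted(vector, integrated_markers(vector, "random")))
```

The sketch deliberately ignores partial integration events, in which part of the vector is lost before random insertion; those cases are the reason an optional fifth sequence region may be useful.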

Each of the first and second sequence regions comprises a nucleotide sequence that is substantially homologous to a sequence at the target genomic locus. As used herein, “substantially homologous” refers to a degree of homology between the two DNA sequences that is at least 25%. Preferably, each of the homologous sequences is at least 20 bp, more preferably at least 200 bp, still more preferably at least 1 kbp, and most preferably at least 2.5 kbp in length. The degree of homology between each of the homologous sequences and the corresponding target sequence is preferably at least 50%, more preferably at least 75%, still more preferably at least 90%, and most preferably 100%. Once a target sequence region in the genome of a target cell is given, one skilled in the art will be able to select homologous sequences that can be used in targeting the sequence region.
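The thresholds in the preceding paragraph can be illustrated with a short Python sketch. The specification does not state how the degree of homology is computed; the sketch assumes, purely for illustration, percent identity over an ungapped alignment of two equal-length sequences. All function names and tier labels are hypothetical.

```python
from typing import List, Tuple

# Thresholds taken from the text, highest first.
IDENTITY_TIERS: List[Tuple[float, str]] = [
    (100.0, "most preferred"),
    (90.0, "still more preferred"),
    (75.0, "more preferred"),
    (50.0, "preferred"),
    (25.0, "substantially homologous"),
]
LENGTH_TIERS_BP: List[Tuple[float, str]] = [
    (2500, "most preferred"),
    (1000, "still more preferred"),
    (200, "more preferred"),
    (20, "preferred"),
]


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions in two equal-length, ungapped sequences (assumed metric)."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def classify(value: float, tiers: List[Tuple[float, str]], below: str) -> str:
    """Return the label of the highest threshold that the value meets."""
    for threshold, label in tiers:
        if value >= threshold:
            return label
    return below


def homology_tier(identity_pct: float) -> str:
    return classify(identity_pct, IDENTITY_TIERS, "not substantially homologous")


def length_tier(length_bp: int) -> str:
    return classify(length_bp, LENGTH_TIERS_BP, "below preferred minimum")


identity = percent_identity("ACGT", "ACGA")
print(identity, homology_tier(identity), length_tier(4))
```

A real design would typically use an alignment tool rather than position-by-position comparison, since insertions and deletions shift the register of otherwise homologous sequences.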

The third sequence region comprises a nucleotide sequence that encodes a positive selection marker. The positive selection marker gene can be any gene encoding a measurable and selectable marker in the type of cells, e.g., a type of mammalian cells, known in the art. In one embodiment, the positive selection marker gene is a gene encoding β-galactosidase. In another embodiment, the positive selection marker gene is a gene encoding β-geo. In still another embodiment, the positive selection marker gene is a drug resistance gene, such as but not limited to Neomycin/G418, Puromycin, Hygromycin B, Zeocin, or mycophenolic acid resistance gene. In still another embodiment, the positive selection marker gene is a gene encoding a cell surface marker, such as but not limited to a gene encoding CD4, CD8, CD20, HA, or any synthetic or foreign cell surface marker. The positive selection marker gene can also be a gene encoding a fluorescent marker, such as but not limited to a gene encoding green fluorescence protein (GFP), blue fluorescence protein (BFP), red fluorescence protein (RFP), or any variants thereof (see, e.g., Autofluorescent Proteins available at http://www.qbiogene.com/protocols/gene-expression/m-afp.pdf (accessed Sep. 5, 2001); Ellenberg et al., 1999, Trends in Cell Biol 9:52-56; Mizuno et al., 2001, Biochem. 40:2502-10; and Living Colors® User Manual, published Aug. 30, 2000, available at http://www.clontech.com/techinfo/manuals/PDF/PT2040-1.pdf (accessed Sep. 5, 2001)). In a preferred embodiment, the positive selection marker gene comprises a splicing acceptor at its 5′ end that allows fusion of the positive selection marker gene to the RNA transcript from the upstream exons (see, e.g., Li et al., 1996, Cell 85:319-329). The positive selection marker gene can also encode a combination of more than one positive selection marker. In one embodiment, the positive selection marker gene encodes a rsGFP-neo fusion protein (see, e.g., Autofluorescent Proteins available at http://www.qbiogene.com/protocols/gene-expression/m-afp.pdf). It will be apparent to one skilled in the art that any positive selection marker genes that are functionally equivalent to any of the positive selection marker genes as described, including any genes that are modified or mutated from any of the described positive selection marker genes, are also within the scope of the present invention.

The third sequence region can also comprise regulatory sequences regulating the expression of the positive selection marker. In one embodiment, the third sequence region comprises a regulatory sequence comprising a promoter that regulates the expression of the positive selection marker gene. This is especially useful when the DNA construct is inserted at a genomic locus to activate an inactive endogenous gene. The regulatory sequences can also comprise other sequences that facilitate expression of the positive selection marker, e.g., enhancers. Any regulatory sequences, e.g., regulated or constitutive promoters, enhancers, etc., known in the art can be used. One skilled in the art will be able to choose the appropriate regulatory sequences for this purpose.

The third sequence region can also comprise any other sequences to be inserted into the genome of the target cells (see, e.g., Limin Li, U.S. Provisional Patent Application No. 60/325,497, filed on Sep. 27, 2001, which is incorporated herein by reference in its entirety). In one embodiment, the third sequence region comprises a regulated expression sequence portion comprising a regulated promoter and a selection marker under the control of the regulated promoter. The regulated promoter can be any transcription regulation system known in the art that can be used in the chosen type of cells (see, e.g., Gossen et al., 1995, Science 268:1766-1769; Lucas et al., 1992, Annu. Rev. Biochem. 61:1131; Li et al., 1996, Cell 85:319-329; Saez et al., 2000, Proc. Natl. Acad. Sci. USA 97:14512-14517; and Pollock et al., 2000, Proc. Natl. Acad. Sci. USA 97:13221-13226). In one embodiment, a tetracycline regulated gene expression system is used (see, e.g., Gossen et al., 1995, Science 268:1766-1769). In another embodiment, an ecdysone regulated gene expression system is used (see, e.g., Saez et al., 2000, Proc. Natl. Acad. Sci. USA 97:14512-14517). In still another embodiment, a MMTV glucocorticoid response element regulated gene expression system is used (see, e.g., Lucas et al., 1992, Annu. Rev. Biochem. 61:1131). Other protein or chemical regulated gene expression systems can also be used (see, e.g., Li et al., 1996, Cell 85:319-329).

The selection marker gene in the regulated expression sequence portion can be any selection marker that can be expressed in the chosen type of cells, e.g., a chosen type of mammalian cells, known in the art. In one embodiment, a drug resistance gene is used as the selection marker. Drug resistance genes that can be used in the present invention include, but are not limited to, Neomycin/G418, Puromycin, Hygromycin B, Zeocin, or mycophenolic acid resistance genes. In another embodiment, a cell surface marker is used as the selection marker. Cell surface marker genes that can be used in the present invention include, but are not limited to, genes encoding CD4, CD8, CD20, HA, or any synthetic or foreign cell surface markers. In still another embodiment, a fluorescence marker is used as the selection marker. Fluorescent markers that can be used in the present invention include, but are not limited to, genes encoding green fluorescence protein (GFP), blue fluorescence protein (BFP), red fluorescence protein (RFP), or any variants thereof (see, e.g., Autofluorescent Proteins available at http://www.qbiogene.com/protocols/gene-expression/m-afp.pdf (accessed Sep. 5, 2001); Ellenberg et al., 1999, Trends in Cell Biol 9:52-56; Mizuno et al., 2001, Biochem. 40:2502-10; and Living Colors® User Manual, published Aug. 30, 2000, available at http://www.clontech.com/techinfo/manuals/PDF/PT2040-1.pdf (accessed Sep. 5, 2001)). The selection marker expressed by the selection marker gene in the regulated expression portion can be the same as or different from the positive selection marker. In a preferred embodiment, the selection marker expressed by the selection marker gene in the regulated expression portion is different from the positive selection marker.

In embodiments where a regulated expression sequence portion is included, the regulated expression sequence portion can be placed in either orientation in relation to other components in the gene targeting vector. In a preferred embodiment, the regulated expression sequence portion is placed in the opposite orientation to the positive selection marker. In such an embodiment, the regulated expression sequence portion can be located either upstream or downstream of the positive selection marker gene. In another embodiment, in which a regulatory sequence is included to activate the expression of the positive selection marker gene, the regulated expression sequence portion is placed in the same orientation as the positive selection marker gene.

The third sequence region of the gene targeting vector can also comprise an optional rapid cloning element comprising a bacterial plasmid replication origin and a bacterial selection marker. As used herein, a “rapid cloning element” refers to a nucleotide sequence which can be used to facilitate the cloning of the genomic sequences flanking the integration site in a host, e.g., in a bacterial host. In the present invention, a rapid cloning element comprising a replication origin is often used. As used herein, an “origin” or “replication origin” refers to a bacterial replication origin sequence. Preferably, the replication origin sequence comprises all necessary sequences for initiation of replication and segregation. Any bacterial plasmid replication origin, such as but not limited to Ori, colEI, pSC101, pUC, or f1 phage ori, can be used. Any bacterial selection marker, such as but not limited to a chloramphenicol, ampicillin, tetracycline, or kanamycin resistance marker, can be used in the present invention. The rapid cloning element functions as a selectable bacterial plasmid to allow efficient cloning of the genomic DNA sequences flanking it into bacterial cells.

The fourth sequence region comprises a selection marker gene encoding a fluorescence marker, e.g., a green fluorescence marker. The fourth sequence region is located outside the homologous sequence regions, i.e., at 5′ to the first or 3′ to the second sequence region. Fluorescent markers that can be used in the present invention include, but are not limited to, genes encoding green fluorescence protein (GFP), blue fluorescence protein (BFP), red fluorescence protein (RFP), or any variants thereof (see, e.g., Autofluorescent Proteins available at http://www.qbiogene.com/protocols/gene-expression/m-afp.pdf (accessed Sep. 5, 2001); Ellenberg et al., 1999, Trends in Cell Biol 9:52-56; Mizuno et al., 2001, Biochem. 40:2502-10; and Living Colors® User Manual, published Aug. 30, 2000, available at http://www.clontech.com/techinfo/manuals/PDF/PT2040-1.pdf (accessed Sep. 5, 2001)). When a fluorescence marker is used as the positive selection marker, it is preferable that the selection marker encoded in the fourth sequence region is a fluorescence marker that has distinguishable excitation and/or emission characteristics from the positive selection marker. In a preferred embodiment, the positive selection marker and the selection marker encoded in the fourth sequence region are one or the other combination of rsGFP and BFP from Qbiogene (Carlsbad, Calif.).

The gene targeting vector can optionally comprise a fifth sequence region comprising a selection marker gene for selection against random, non-homologous recombination. The selection marker encoded by the selection marker gene in the fifth sequence region can be a negative selection marker. Any negative selection marker known in the art can be used in the invention, including but not limited to HSV-tk, Hprt, and Gpt. The selection marker encoded by the selection marker gene in the fifth sequence region can also be a fluorescence marker, which, if a fluorescence marker is used as the positive selection marker, is different from the fluorescence marker used as the positive selection marker. The fluorescence marker encoded by the fifth sequence region can be the same as or different from the fluorescence marker encoded in the fourth sequence region. In one embodiment, the fluorescence marker encoded by the fifth sequence region is the same as the fluorescence marker encoded in the fourth sequence region. In this embodiment, the population of cells containing at least one of the fluorescence markers in their genomes is selected by detecting the fluorescence marker. In another embodiment, the fluorescence marker encoded by the fifth sequence region is different from the fluorescence marker encoded in the fourth sequence region. In a preferred embodiment, the fluorescence marker encoded by the fifth sequence region has distinguishably different emission and/or excitation wavelengths as compared to the fluorescence marker encoded in the fourth sequence region. In this embodiment, the populations of cells containing different fluorescence markers in their genomes can be selected and separated by detecting the different fluorescence markers. The fluorescent markers described above for the fourth sequence region, i.e., genes encoding GFP, BFP, RFP, or any variants thereof, can likewise be used in the fifth sequence region. The fifth sequence region is located at the opposite end of the gene targeting vector from the fourth sequence region, i.e., at 5′ to the first sequence region if the fourth sequence region is located at 3′ to the second sequence region, or at 3′ to the second sequence region if the fourth sequence region is located at 5′ to the first sequence region. The inclusion of the fifth sequence region comprising another selection marker for selection against random integration is useful in enhancing selection against random insertions in which all or part of the selection marker encoded in the fourth sequence region is excised before random insertion occurs.
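The relative positions of the sequence regions described above can be summarized in a small data model. The following Python sketch is illustrative only: the class name, field names, and region labels are hypothetical, and the marker names in the usage line are arbitrary examples rather than a description of any particular vector in the figures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TargetingVector:
    """Hypothetical model of the region order of a gene targeting vector."""
    positive_marker: str                 # encoded in the third sequence region
    fourth_marker: str                   # fluorescence marker outside the homology regions
    fourth_side: str                     # "5prime" or "3prime"
    fifth_marker: Optional[str] = None   # optional marker at the opposite end

    def layout(self) -> List[str]:
        """Return the regions in 5' to 3' order."""
        # The third region always lies between the two homologous regions.
        core = ["homology_1", "positive:" + self.positive_marker, "homology_2"]
        fourth = ["fourth:" + self.fourth_marker]
        fifth = ["fifth:" + self.fifth_marker] if self.fifth_marker else []
        # The fifth region, when present, sits at the end opposite the fourth.
        if self.fourth_side == "5prime":
            return fourth + core + fifth
        if self.fourth_side == "3prime":
            return fifth + core + fourth
        raise ValueError("fourth_side must be '5prime' or '3prime'")


vector = TargetingVector("neo", "GFP", "5prime", fifth_marker="BFP")
print(vector.layout())
```

Only the segment between the two homology regions is expected to remain after homologous recombination, which is why both outside markers are modeled at the ends of the list.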

Depending on the particular gene targeting vector used, additional sequences may be necessary for inclusion in the vector. For example, the gene targeting vector may contain restriction sites to facilitate the manipulation of the vector. The gene targeting vector may also contain sequences that aid the integration of the vector into the host genome. Such sequences and the manner of their inclusion in the vector are well within the knowledge of anyone skilled in the art and will be apparent to anyone skilled in the art when a particular vector is chosen.

5.2. Methods for Identification and Isolation of Cells

The gene targeting vectors can be introduced into mammalian cells by any DNA transfection method known in the art, such as microinjection, electroporation, and transfection with LIPOFECTAMINE reagent.

The transfection of the cells using the gene targeting vector can result in two types of insertion events: insertion by homologous recombination at the target genomic locus and random insertion of the gene targeting vector in the genome. Insertion by homologous recombination at the target locus leads to the integration of the nucleotide sequence between the first and second sequence regions, i.e., the homologous sequences, into the target genome and the excision of any sequence(s) outside the homologous sequence regions, i.e., 5′ of the first sequence region and 3′ of the second sequence region. Therefore, cells that have undergone homologous recombination can be identified by the presence of the positive selection marker activity and the absence of the activity of the selection marker or markers encoded in those outside regions, i.e., the fourth and/or the fifth sequence regions. Random insertion of the gene targeting vector in the host genome, on the other hand, leads to the integration of the entire vector into the genome. Cells that have undergone random insertion can therefore be identified by the presence of the activity of both the positive selection marker and the selection marker or markers encoded in those outside regions.

The gene targeting vector of the invention can be integrated into the genome of transfected cells in two configurations. In one embodiment, the gene targeting vector integrates behind a chromosomal promoter. In this embodiment, the positive selection marker gene is turned on by the chromosomal promoter. Integration of the gene targeting vector results in disruption of transcription at the allele. In another embodiment, the gene targeting vector integrates upstream of an inactive or active chromosomal promoter. In this embodiment, integration of the gene targeting vector activates the inactive chromosomal promoter or amplifies the active chromosomal promoter. This embodiment allows activation of chromosomal genes in cells to screen for any phenotypic changes associated with the activated gene.
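The identification logic described above, in which marker activity distinguishes homologous recombination from random insertion, can be expressed as a simple decision rule. The following Python sketch is a minimal illustration; the function name and return labels are hypothetical. The homologous case is labeled a "candidate" because the specification notes that an outside marker can be excised before random insertion, so selected cells are still characterized afterwards.

```python
from typing import Iterable


def classify_cell(positive_active: bool, outside_active: Iterable[bool]) -> str:
    """Classify one cell from its observed marker activities.

    positive_active: activity of the positive selection marker (third region).
    outside_active:  activities of the marker(s) encoded outside the homology
                     regions (fourth and, if present, fifth sequence regions).
    """
    if not positive_active:
        # No positive marker activity: the cell is not selected at all.
        return "not selected"
    if any(outside_active):
        # Whole-vector integration keeps at least one outside marker active.
        return "random insertion"
    # Positive marker present, every outside marker absent.
    return "candidate homologous recombinant"


print(classify_cell(True, [False, False]))
print(classify_cell(True, [True, False]))
```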

The selection for the presence of the positive selection marker can be carried out by standard methods known in the art, depending on the positive selection marker used. For example, in one preferred embodiment, a drug resistance gene is used as the positive selection marker. In this embodiment, the selection for cells carrying the insertion of the positive selection marker gene can be achieved by culturing the transfected cells in the presence of the corresponding drug. The optimal conditions for selection for insertion of the positive selection marker gene, e.g., concentration of the drug, duration of culturing, etc., can be determined by one skilled in the art once the particular gene is chosen. In another preferred embodiment, a fluorescence marker is used as the positive selection marker. In this embodiment, the selection for cells carrying the insertion of the positive selection marker gene can be achieved by any fluorescence based cell sorting method known in the art. For example, the selection can be carried out using a FACS system. Any FACS system can be used in the present invention. Preferably, a FACS system equipped with multiple excitation lasers is used to permit concurrent selection of both the positive selection marker and the fluorescence marker encoded in the fourth sequence region. One skilled in the art will be able to determine the parameters for the FACS scan, e.g., excitation/emission wavelengths, widths of fluorescence windows, etc., once the fluorescence marker is chosen. Preferably, the fluorescence window is set such that at least 10% of the sorted cells from the initial cell population are cells having the positive selection marker integrated in their genomes. More preferably, still more preferably, and most preferably, the fluorescence window is set such that at least 50%, 70%, and 90%, respectively, of the sorted cells from the initial cell population are cells having the positive selection marker integrated in their genomes.
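The effect of the fluorescence window on the composition of the sorted population can be illustrated numerically. The following Python sketch assumes that each sorted event is later labeled as carrying or not carrying the positive selection marker (for example, by subsequent characterization); the function names, intensity units, and example data are hypothetical. Only the 10%, 50%, 70%, and 90% thresholds are taken from the text.

```python
from typing import Iterable, Optional, Tuple

# Preferred fractions named in the specification, highest first.
PREFERENCE_TIERS = (0.90, 0.70, 0.50, 0.10)


def fraction_marker_positive(events: Iterable[Tuple[float, bool]],
                             low: float, high: float) -> float:
    """Fraction of events inside the window [low, high] that carry the marker.

    Each event is (fluorescence intensity, carries_marker).
    """
    flags = [carries for intensity, carries in events if low <= intensity <= high]
    return sum(flags) / len(flags) if flags else 0.0


def highest_tier_met(fraction: float) -> Optional[float]:
    """Return the highest preferred threshold reached, or None if below 10%."""
    for tier in PREFERENCE_TIERS:
        if fraction >= tier:
            return tier
    return None


# Made-up events in arbitrary intensity units.
events = [(5.0, False), (50.0, False), (80.0, False), (120.0, True), (300.0, True)]
wide = fraction_marker_positive(events, 40.0, 1000.0)
narrow = fraction_marker_positive(events, 100.0, 1000.0)
print(wide, highest_tier_met(wide))
print(narrow, highest_tier_met(narrow))
```

In this made-up data set, narrowing the window raises the marker-positive fraction of the sorted cells while reducing the number of cells recovered.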

The selection against random, non-homologous integration of the gene targeting vector can be carried out by selecting cells that do not carry the insertion of the fluorescence marker gene encoded in the fourth sequence region of the gene targeting vector. The selection can be achieved using any fluorescence based cell sorting method known in the art. The step of selection against random, non-homologous integration of the gene targeting vector can be carried out before, concurrently with, or after the step of selection for the presence of the positive selection marker. Depending on the combination of the positive selection marker and the fluorescence marker encoded by a DNA sequence in the fourth sequence region, one skilled in the art will be able to determine the optimal order of the two selection steps. In a preferred embodiment, when a drug resistance gene is used as the positive selection marker, the step of selection against random, non-homologous integration is carried out after the step of selection for the presence of the positive selection marker. In another preferred embodiment, when a gene encoding a fluorescence marker is used as the positive selection marker, the step of selection against random, non-homologous integration can be carried out concurrently with the step of selection for the presence of the positive selection marker.
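The two preferred orderings of the selection steps can be written as a lookup on the type of positive selection marker. The following Python sketch is illustrative only: the function name, the marker-type strings, and the step descriptions are hypothetical, and other orderings (before, concurrently with, or after) remain possible as stated above.

```python
from typing import List


def preferred_selection_plan(positive_marker_type: str) -> List[str]:
    """Return the preferred order of selection steps for a marker type."""
    if positive_marker_type == "drug_resistance":
        # Drug selection first, then fluorescence based sorting against
        # the marker encoded in the fourth sequence region.
        return [
            "culture in the corresponding drug (positive selection)",
            "sort for absence of the fourth-region fluorescence marker",
        ]
    if positive_marker_type == "fluorescence":
        # Both markers are fluorescent, so one sort can apply both criteria.
        return [
            "sort concurrently for positive marker fluorescence and "
            "absence of the fourth-region fluorescence marker",
        ]
    raise ValueError("unsupported positive marker type: " + positive_marker_type)


for step in preferred_selection_plan("drug_resistance"):
    print(step)
```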

In one embodiment, the step of selection against random, non-homologous integration is carried out using a standard FACS system. Any FACS system can be used in the present invention. One skilled in the art will be able to determine the parameters for the FACS machine, e.g., excitation/emission wavelengths, fluorescence windows, etc., once the fluorescence marker is chosen. Preferably, the fluorescence window is set such that at least 10% of the sorted cells from the initial cell population are cells that do not carry the insertion of the fluorescence marker gene encoded in the fourth sequence region of the gene targeting vector. More preferably, the fluorescence window is set such that at least 30%, 50%, 70%, or 90% of the sorted cells from the initial cell population are cells that do not carry the insertion of the fluorescence marker gene encoded in the fourth sequence region of the gene targeting vector.

Cells that are selected can be characterized by standard methods known in the art. In one embodiment, standard PCR and sequencing procedures are used to characterize the cells.

In another embodiment, cells are characterized by making use of the rapid cloning element. In this embodiment, homozygous mutations are characterized by the following steps: first, the rapid cloning element and its flanking genomic DNA are linearized by a single restriction enzyme or two compatible restriction enzymes, then recircularized by DNA ligation, and transformed into bacteria. The plasmids isolated from the transformed bacteria are used to determine the DNA sequence of the flanking exons by any DNA sequencing method known in the art.

6. EXAMPLES

The following examples are presented by way of illustration of the present invention, and are not intended to limit the present invention in any way. In particular, the examples presented hereinbelow describe insertion of pGT-neo/GFP/BFP and pGT-GFP/BFP in the TSG101 locus of the genome of human fibroblast cell line CLL212 (ATCC). This cell line was transfected with either a pTet-Off or pTet-On expression vector (Clontech), and clones that have the optional expression of transactivator (either TetR or rTetR) were identified by their ability to transactivate a Tet response vector that expresses a detectable marker, beta-galactosidase (Clontech). This modified cell line is designated as CLL212-Trans.

The gene targeting vector depicted in FIG. 4 was constructed as follows: a neo fragment from pSV2neo (Clontech) was inserted into a tetracycline regulated expression vector pUHD 10-3 (http://www.zmbh.uniheidelberg.de/bujard/homepage.html, accessed Sep. 20, 2001) to give pTet-neo. An sgGFP expression cassette and an sgBFP expression cassette were inserted into pTet-neo as shown in FIG. 4 to generate pGT-neo/GFP/BFP.

To target the TSG101 locus, a 4 kb region of the TSG101 gene that spans exons 4-6 was chosen (GENBANK® accession no. NT_009307.5). This 4 kb fragment was divided into homologous recombination region 1 (SEQ ID NO:1) and homologous recombination region 2 (SEQ ID NO:2), each about 2 kb in length (see FIGS. 6A-B). Homologous recombination region 1 was inserted into pGT-neo/GFP/BFP at a Hind III site, and homologous recombination region 2 was inserted at an EcoR I site to give pGT-neo/GFP/BFP-TSG101. CLL212-Trans cells were transfected with the gene targeting vector (pGT-neo/GFP/BFP-TSG101) by electroporation (Li et al., 1996, Cell 85:319-329). Transfected cells were first cultured for 24 to 48 hours and then further cultured in the presence of G418 (400 µg/ml) for 7-10 days. G418-resistant clones were screened under a fluorescence microscope for the expression of GFP and BFP. G418-resistant clones that expressed neither GFP nor BFP were isolated and expanded into cell lines. These clones were confirmed to have undergone the desired homologous recombination at the TSG101 locus by genomic Southern blotting analysis and PCR analysis. Western blotting using a rabbit anti-TSG101 antibody (Clontech, see also Li et al., Proc. Natl. Acad. Sci. USA, 98:1619-24) further confirmed the inactivation of TSG101 protein production.

The gene targeting vector depicted in FIG. 5 was constructed as follows. Briefly, an sgGFP fragment (http://www.qbiogene.com/protocols/gene-expression/m-afp.pdf, accessed Sep. 5, 2001) was inserted into a tetracycline regulated expression vector pUHD 10-3 (http://www.zmbh.uniheidelberg.de/bujard/homepage.html, accessed Sep. 20, 2001) to generate pTet-GFP. An sgBFP expression cassette was inserted into pTet-GFP as shown in FIG. 5 to generate pGT-GFP/BFP.

To target the TSG101 locus, a 4 kb region of the TSG101 gene that spans exons 4-6 was chosen (GENBANK® accession no. NT_009307.5). This 4 kb fragment was divided into homologous recombination region 1 (SEQ ID NO:1) and homologous recombination region 2 (SEQ ID NO:2), each about 2 kb in length (see FIGS. 6A-B). Homologous recombination region 1 was inserted into pGT-GFP/BFP at a Hind III site, and homologous recombination region 2 was inserted at an EcoR I site to give pGT-GFP/BFP-TSG101. CLL212-Trans cells were transfected with the gene targeting vector (pGT-GFP/BFP-TSG101) by electroporation (Li et al., 1996, Cell 85:319-329). Transfected cells were cultured for 24 to 48 hours. The cell cultures were then trypsinized, and the cells were analyzed by FACS. Only cells that expressed GFP but did not express BFP were sorted from the population. The sorted cells were expanded into cell lines. These clones were confirmed to have undergone the desired homologous recombination at the TSG101 locus by genomic Southern blotting analysis and PCR analysis. Western blotting using a rabbit anti-TSG101 antibody (Clontech, see also Li et al., Proc. Natl. Acad. Sci. USA, 98:1619-24) further confirmed the inactivation of TSG101 protein production.
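The sorting criterion used in this example, i.e., GFP expressed and BFP not expressed, corresponds to a two-channel gate. The following Python sketch is a minimal illustration under stated assumptions: the function name, the dictionary keys, the threshold values, and the event data are all hypothetical, and actual gate settings would be determined on the instrument once the markers are chosen.

```python
from typing import Dict, List


def sort_gfp_positive_bfp_negative(cells: List[Dict[str, float]],
                                   gfp_min: float,
                                   bfp_max: float) -> List[Dict[str, float]]:
    """Keep cells with GFP signal at or above gfp_min and BFP signal below bfp_max."""
    return [c for c in cells if c["gfp"] >= gfp_min and c["bfp"] < bfp_max]


# Made-up events in arbitrary fluorescence units.
cells = [
    {"id": 1, "gfp": 850.0, "bfp": 12.0},   # GFP on, BFP off: kept
    {"id": 2, "gfp": 900.0, "bfp": 640.0},  # both on: consistent with random insertion
    {"id": 3, "gfp": 15.0, "bfp": 9.0},     # neither on: not selected
    {"id": 4, "gfp": 20.0, "bfp": 700.0},   # BFP only: not selected
]
kept = sort_gfp_positive_bfp_negative(cells, gfp_min=200.0, bfp_max=100.0)
print([c["id"] for c in kept])
```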

7. References Cited

All references cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication or patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety for all purposes.

Many modifications and variations of the present invention can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. The specific embodiments described herein are offered by way of example only, and the invention is to be limited only by the terms of the appended claims along with the full scope of equivalents to which such claims are entitled.

1. A method for generating a plurality of cells comprising cells that carry an insertion of a DNA sequence in the genome by homologous recombination, said method comprising transfecting cells of a cell type with a gene targeting vector comprising: (a) a first sequence region comprising a nucleotide sequence which is substantially homologous to a first target DNA sequence in the genome of cells of said cell type; (b) a second sequence region comprising a nucleotide sequence which is substantially homologous to a second target DNA sequence in the genome of cells of said cell type; (c) a third sequence region located between said first and second sequence regions, comprising a nucleotide sequence that encodes a positive selection marker; and (d) a fourth sequence region comprising a nucleotide sequence encoding a fluorescence marker, located at 5′ to said first or 3′ to said second sequence region, wherein said positive selection marker is expressed in said cells that carry said insertion by homologous recombination, and wherein said fluorescence marker encoded in said fourth sequence region is not expressed in said cells that carry said insertion by homologous recombination.
2. The method of claim 1, wherein said gene targeting vector further comprises a fifth sequence region comprising a DNA sequence encoding a selection marker, wherein said fifth sequence region is located at 5′ to said first sequence region if said fourth sequence region is located at the 3′ to said second sequence region or at 3′ to said second sequence region if said fourth sequence region is located at the 5′ to said first sequence region.
3. The method of claim 1, further comprising the step of selecting said cells that carry said insertion by homologous recombination.
4. The method of claim 3, wherein said step of selecting comprises (a) selecting cells wherein said positive selection marker is expressed; and (b) selecting cells wherein said fluorescence marker encoded in said fourth sequence region is not expressed.
5. The method of claim 4, wherein said step (b) is carried out after said step (a).
6. The method of claim 5, wherein said step (b) is carried out by a fluorescence activated cell sorter.
7. The method of claim 1, 2, or 3, wherein said positive selection marker gene is a gene selected from the group consisting of a drug resistance gene, a gene encoding a surface marker, a gene encoding a fluorescence marker, a gene encoding β-galactosidase, and a gene encoding β-geo.
8. The method of claim 5, wherein said positive selection marker gene is a drug resistance gene.
9. The method of claim 8, wherein said drug resistance gene is selected from the group consisting of a Neomycin/G418 resistance gene, a Puromycin resistance gene, a Hygromycin B resistance gene, a Zeocin resistance gene, and a mycophenolic acid resistance gene.
10. The method of claim 4, wherein said positive selection marker gene is a gene encoding a fluorescence marker.
11. The method of claim 10, wherein said gene encoding a fluorescence marker is selected from the group consisting of a gene encoding a green fluorescence marker, a gene encoding a blue fluorescence marker, and a gene encoding a red fluorescence marker.
12. The method of claim 10 or 11, wherein said step (a) is carried out by a fluorescence activated cell sorter.
13. The method of claim 12, wherein said step (a) and step (b) are carried out concurrently.
14. The method of claim 13, wherein said step of selection is carried out such that said cells that carry said insertion by homologous recombination constitute at least 10% of said plurality of cells.
15. The method of claim 14, wherein said step of selection is carried out such that said cells that carry said insertion by homologous recombination constitute at least 30% of said plurality of cells.
16. The method of claim 15, wherein said step of selection is carried out such that said cells that carry said insertion by homologous recombination constitute at least 50% of said plurality of cells.
17. The method of claim 16, wherein said step of selection is carried out such that said cells that carry said insertion by homologous recombination constitute at least 70% of said plurality of cells.
18. The method of claim 17, wherein said step of selection is carried out such that said cells that carry said insertion by homologous recombination constitute at least 90% of said plurality of cells.
19. The method of any one of claims 3-6 and 8-18, wherein said gene targeting vector further comprises a fifth sequence region comprising a DNA sequence encoding a selection marker, wherein said fifth sequence region is located at 5′ to said first sequence region if said fourth sequence region is located at the 3′ to said second sequence region or at 3′ to said second sequence region if said fourth sequence region is located at the 5′ to said first sequence region, and wherein said method further comprises a step of selecting cells wherein said selection marker encoded in said fifth sequence region is not expressed.
20. The method of claim 19, wherein said selection marker encoded in said fifth sequence region is a fluorescence marker.
21. The method of claim 4 or 5, wherein said positive selection marker gene is a gene encoding a surface marker.
22. The method of claim 4 or 5, wherein said positive selection marker gene is a gene encoding β-galactosidase.
23. The method of claim 4 or 5, wherein said positive selection marker gene is a gene encoding β-geo.
24. The method of any one of claims 1-6, wherein said positive selection marker gene is a gene encoding a combination of more than one selection markers.
25. The method of claim 24, wherein said gene encoding a combination of more than one selection markers encodes a rsGFP-neo fusion protein.
26. The method of claim 24, wherein said gene targeting vector further comprises a fifth sequence region comprising a DNA sequence encoding a selection marker, wherein said fifth sequence region is located at 5′ to said first sequence region if said fourth sequence region is located at the 3′ to said second sequence region or at 3′ to said second sequence region if said fourth sequence region is located at the 5′ to said first sequence region.
27. The method of claim 5 or 6, wherein said step (b) is carried out such that at least 10% of the sorted cells from the initial cell population are cells that do not carry the insertion of the fluorescence marker gene encoded in the fourth sequence region of the gene targeting vector.
28. The method of claim 27, wherein said step (b) is carried out such that at least 30% of the sorted cells from the initial cell population are cells that do not carry the insertion of the fluorescence marker gene encoded in the fourth sequence region of the gene targeting vector.
29. The method of claim 28, wherein said step (b) is carried out such that at least 50% of the sorted cells from the initial cell population are cells that do not carry the insertion of the fluorescence marker gene encoded in the fourth sequence region of the gene targeting vector.
30. The method of claim 29, wherein said step (b) is carried out such that at least 70% of the sorted cells from the initial cell population are cells that do not carry the insertion of the fluorescence marker gene encoded in the fourth sequence region of the gene targeting vector.
31. The method of claim 30, wherein said step (b) is carried out such that at least 90% of the sorted cells from the initial cell population are cells that do not carry the insertion of the fluorescence marker gene encoded in the fourth sequence region of the gene targeting vector.
32. A gene targeting vector for inserting a DNA sequence in the genome of cells of a cell type, comprising (a) a first sequence region comprising a nucleotide sequence which is substantially homologous to a first target DNA sequence in the genome of cells of said cell type; (b) a second sequence region comprising a nucleotide sequence which is substantially homologous to a second target DNA sequence in the genome of cells of said cell type; (c) a third sequence region located between said first and second sequence regions, comprising a nucleotide sequence that encodes a positive selection marker; and (d) a fourth sequence region comprising a nucleotide sequence encoding a fluorescence marker, located at 5′ to said first or 3′ to said second sequence region, wherein said positive selection marker is expressed in said cells if said nucleotide sequence encoding said positive selection marker is integrated in the genome of said cells, and wherein said fluorescence marker is expressed in said cells if said nucleotide sequence encoding said fluorescence marker is integrated in the genome of said cells.
33. The gene targeting vector of claim 32, wherein said positive selection marker gene is a drug resistance gene.
34. The gene targeting vector of claim 33, wherein said drug resistance gene is selected from the group consisting of a Neomycin/G418 resistance gene, a Puromycin resistance gene, a Hygromycin B resistance gene, a Zeocin resistance gene, and a mycophenolic acid resistance gene.
35. The gene targeting vector of claim 32, wherein said positive selection marker gene is a gene encoding a fluorescence marker.
36. The gene targeting vector of claim 35, wherein said gene encoding a fluorescence marker is selected from the group consisting of a gene encoding a green fluorescence marker, a gene encoding a blue fluorescence marker, and a gene encoding a red fluorescence marker.
37. The gene targeting vector of claim 32, wherein said positive selection marker gene is a gene encoding a surface marker.
38. The gene targeting vector of claim 32, wherein said positive selection marker gene is a gene encoding β-galactosidase.
39. The gene targeting vector of claim 32, wherein said positive selection marker gene is a gene encoding β-geo.
40. The gene targeting vector of claim 32, wherein said positive selection marker gene is a gene encoding a combination of more than one selection markers.
41. The gene targeting vector of claim 40, wherein said gene encoding a combination of more than one selection markers encodes a rsGFP-neo fusion protein.
42. The gene targeting vector of any one of claims 32-41, wherein said gene encoding a fluorescence marker is selected from the group consisting of a gene encoding a green fluorescence marker, a gene encoding a blue fluorescence marker, and a gene encoding a red fluorescence marker.
43. The gene targeting vector of any one of claims 32-41, further comprising a fifth sequence region comprising a DNA sequence encoding a selection marker, wherein said fifth sequence region is located at 5′ to said first sequence region if said fourth sequence region is located at the 3′ to said second sequence region or at 3′ to said second sequence region if said fourth sequence region is located at the 5′ to said first sequence region.
44. The method of claim 43, wherein said selection marker encoded in said fifth sequence region is a fluorescence marker.
45. The method of claim 44, wherein said fluorescence marker is the same as said fluorescence marker encoded in the fourth sequence region.
46. The method of claim 44, wherein said fluorescence marker is different from said fluorescence marker encoded in the fourth sequence region.